Linked Servers

10/24/2010 4:49:53 PM

Linked servers enable SQL Server–based applications to include almost any other type of data source in the execution of a SQL statement, including direct references to remote SQL Server instances. They also make it possible to issue distributed queries, updates, deletes, inserts, commands, and full transactions against heterogeneous data sources across your entire company network. SQL Server essentially acts as the master query manager. Then, via OLE DB providers and OLE DB data sources, any compliant data source can be referenced from any valid SQL statement or command. Each data source is either referenced directly, or SQL Server creates provider-specific subqueries that are issued to a specialized provider. This comes very close to a federated data management capability across most heterogeneous data sources.
Unlike remote servers, linked servers have two simple setup steps:

1. Define the remote server on the local server.

2. Define the method for mapping remote logins on the local server.

All linked server configurations are performed on the local server. The mapping for the local user to the remote user is stored in the local SQL Server database. In fact, you don’t need to configure anything in the remote database. Using linked servers also allows SQL Server to use OLE DB to link to many data sources other than just SQL Server.
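
The two setup steps can be sketched in T-SQL as follows. This is a minimal illustration, not a definitive configuration: the linked server name REMOTE_SALES, the data source, and the remote login are hypothetical placeholders, and both statements run on the local server only.

```sql
-- Step 1: define the remote server on the local server.
-- REMOTE_SALES and remote-host\instance1 are hypothetical names.
EXEC sp_addlinkedserver
    @server     = N'REMOTE_SALES',           -- name used locally in four-part references
    @srvproduct = N'',
    @provider   = N'SQLNCLI',                -- SQL Server Native Client OLE DB provider
    @datasrc    = N'remote-host\instance1';  -- network name of the remote instance

-- Step 2: define how local logins map to a login on the remote server.
-- With @locallogin = NULL, the mapping applies to all local logins.
EXEC sp_addlinkedsrvlogin
    @rmtsrvname  = N'REMOTE_SALES',
    @useself     = N'False',                 -- do not pass the local credentials through
    @locallogin  = NULL,
    @rmtuser     = N'remote_reader',         -- hypothetical remote login
    @rmtpassword = N'<password>';            -- placeholder, supply your own
```

Setting @useself to 'True' instead would connect with the caller's own credentials, which may suit environments where the same logins exist on both servers.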

OLE DB is an API that allows COM/.NET applications to work with databases as well as other data sources, such as text files and spreadsheets. This capability gives SQL Server access to many different types of data as if these other data sources were local SQL Server tables or views. This is extremely powerful.
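
As an illustration of a non-SQL Server data source, the sketch below defines a linked server over an Excel workbook and queries a worksheet as if it were a table. The linked server name EXCEL_BUDGET, the file path, and the worksheet name are assumptions made for the example, and it presumes the Jet OLE DB provider is installed on the local server (newer workbook formats typically need a different provider).

```sql
-- Define a linked server over an Excel workbook (hypothetical name and path).
EXEC sp_addlinkedserver
    @server     = N'EXCEL_BUDGET',
    @srvproduct = N'Jet 4.0',
    @provider   = N'Microsoft.Jet.OLEDB.4.0',
    @datasrc    = N'C:\Data\budget.xls',     -- full path to the workbook
    @provstr    = N'Excel 8.0';              -- provider string for .xls workbooks

-- Query a worksheet like a table. Catalog and schema are left empty,
-- so the four-part name contains three consecutive dots.
SELECT *
FROM EXCEL_BUDGET...[Sheet1$];
```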

Unlike remote servers, which support only remote procedure calls, linked servers also allow distributed queries and distributed transactions.

Truly a Linked Server

Keep in mind that when you define linked servers, SQL Server really keeps these data resources linked in many ways. Most importantly, it keeps the schema definitions linked. In other words, if the schema of a remote table on a linked server changes, any server that has a link to it also sees the change. Even when the linked server’s schema comes from something such as Excel, any change you make to the Excel spreadsheet is automatically reflected at the local SQL Server that has defined that spreadsheet as a linked server. This is extremely significant from a metadata and schema integrity point of view. This is what is meant by “completely linked.”
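
One way to observe this linkage is to ask the linked server for its current metadata. The sketch below uses the system procedures sp_tables_ex and sp_columns_ex. It assumes a linked server named sql_server with a customer table already exists; both names are hypothetical.

```sql
-- List the tables that the linked server currently exposes.
EXEC sp_tables_ex
    @table_server = N'sql_server';

-- List the columns of one remote table as the provider reports them now.
-- Rerunning this after a remote schema change should show the new definition.
EXEC sp_columns_ex
    @table_server = N'sql_server',
    @table_name   = N'customer';
```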

Distributed Queries

Distributed queries access data stored in OLE DB data sources. SQL Server treats these data sources as if they contained SQL Server tables. Basically, an OLE DB provider exposes the data source as rowsets, the tabular form in which SQL Server needs to see any data. The Microsoft SQL Native Client OLE DB provider (with PROGID SQLNCLI) is the official OLE DB provider for SQL Server 2008. You can view or manipulate data through this provider by using the same basic Data Manipulation Language (DML) syntax that T-SQL uses for local data (SELECT, INSERT, UPDATE, or DELETE statements). The main difference is the table-naming convention. Distributed queries use a four-part table name syntax for each data source as follows:

linked_server_name.catalog.schema.object_name

The following distributed query accesses data from a sales table in an Oracle database, a region table in a Microsoft Access database, and a customer table in a SQL Server database—all with a single SQL statement:

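-- Join three heterogeneous sources in a single statement.
SELECT s.sales_amount
FROM access_server...region AS r                          -- Microsoft Access
INNER JOIN oracle_server..sales_owner.sale AS s           -- Oracle
    ON r.region_id = s.region_id
INNER JOIN sql_server.customer_db.dbo.customer AS c       -- SQL Server
    ON s.customer_id = c.customer_id
WHERE r.region_name = 'Southwest'
  AND c.customer_name = 'ABC Steel';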

All these data sources are on completely different physical machines. But with linked servers and distributed queries, you might not ever realize this.
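
The same four-part naming applies to data modification statements. The following is a minimal sketch, assuming a linked server named sql_server, an updatable remote customer table, and a login mapping with sufficient permissions. The column values are invented for illustration.

```sql
-- Read from the remote table through its four-part name.
SELECT c.customer_id, c.customer_name
FROM sql_server.customer_db.dbo.customer AS c
WHERE c.customer_name = 'ABC Steel';

-- Modify the remote table with ordinary DML (hypothetical values).
UPDATE sql_server.customer_db.dbo.customer
SET customer_name = 'ABC Steel Inc.'
WHERE customer_id = 1001;
```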

Distributed Transactions

With SQL Server distributed transactions, you can manipulate data from several different data sources in a single transaction, coordinated by the Microsoft Distributed Transaction Coordinator (MS DTC). Distributed transactions are supported if the OLE DB provider has the XA transactional functionality built in. For example, suppose two banks decide to merge. The first bank (let’s call it OraBank) stores all checking and savings accounts in an Oracle database. The second bank (let’s call it SqlBank) stores all checking and savings accounts in a SQL Server 2008 database. A customer has a checking account with OraBank and a savings account with SqlBank. What happens if the customer wants to transfer $100 from the checking account to the savings account? You can handle this task by using the following code while maintaining transactional consistency:

SET XACT_ABORT ON; -- needed for data modifications through most OLE DB providers
BEGIN DISTRIBUTED TRANSACTION;
-- One hundred dollars is subtracted from the checking account at OraBank.
UPDATE oracle_server..checking_owner.checking_table
 SET account_balance = account_balance - 100
WHERE account_number = 12345;
-- One hundred dollars is added to the savings account at SqlBank.
UPDATE sql_server.savings_db.dbo.savings_table
 SET account_balance = account_balance + 100
WHERE account_number = 98765;
COMMIT TRANSACTION;

The transaction is either committed or rolled back on both databases.
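
To make the rollback path explicit, the transfer can be wrapped in TRY...CATCH. This is a hedged sketch rather than a definitive implementation: the object names are hypothetical and follow the scenario described above (checking account at OraBank, savings account at SqlBank), and it assumes MS DTC is running and reachable from both servers.

```sql
SET XACT_ABORT ON;  -- any run-time error dooms the whole transaction

BEGIN TRY
    BEGIN DISTRIBUTED TRANSACTION;

    -- Debit the checking account held in the Oracle database.
    UPDATE oracle_server..checking_owner.checking_table
       SET account_balance = account_balance - 100
     WHERE account_number = 12345;

    -- Credit the savings account held in the SQL Server database.
    UPDATE sql_server.savings_db.dbo.savings_table
       SET account_balance = account_balance + 100
     WHERE account_number = 98765;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Undo the work on both servers if a transaction is still open.
    IF XACT_STATE() <> 0
        ROLLBACK TRANSACTION;

    -- Report the original error to the caller.
    DECLARE @msg nvarchar(2048) = ERROR_MESSAGE();
    RAISERROR(N'%s', 16, 1, @msg);
END CATCH;
```

If either update fails, control passes to the CATCH block, so neither account is left half-updated.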